NoSQL Cheatsheet

NoSQL Concepts Overview

NoSQL: Non-relational databases designed to handle large volumes of rapidly changing data and to scale out horizontally. Many relax strict ACID guarantees in exchange for performance and flexibility.
CAP Theorem: When a network partition occurs, a distributed system must choose between Consistency and Availability; it cannot guarantee both. This is often summarized as "pick two of Consistency, Availability, and Partition Tolerance." Different NoSQL databases choose different trade-offs.
BASE: “Basically Available, Soft state, Eventually consistent.” Many NoSQL databases follow BASE principles instead of strict ACID transactions to optimize for performance and scalability.
Horizontal Scalability: NoSQL databases typically scale by adding more nodes and sharding (partitioning) data across them, rather than by vertically scaling a single server.
Schema Flexibility: Most NoSQL databases do not enforce a rigid schema, so the data model can evolve as requirements change.
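
To make the sharding idea above concrete, here is a minimal sketch of hash-based shard routing. The function name `shard_for`, the shard count, and the sample keys are assumptions for illustration; real databases use their own partitioners.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings and would not route consistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical keys, spread over three hypothetical shards.
keys = [f"user:{i}" for i in range(1000, 1012)]
placement = {}
for key in keys:
    placement.setdefault(shard_for(key, 3), []).append(key)

for shard, shard_keys in sorted(placement.items()):
    print(shard, shard_keys)
```

Simple modulo routing like this remaps most keys whenever the shard count changes, which is one reason many systems use consistent hashing or token ranges instead.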

Key-Value Databases

Definition: Stores each value under a unique key. Good for caching and real-time applications with simple lookups. Examples: Redis, Memcached.
Schema example:
    Key: "user:1001"
    Value: "John Doe"
Typical Use Cases: Session management, caching, leaderboard counts, token storage, quick retrieval by key.
Example:
    Key: "session:abc123"
    Value: '{"user_id": 1001, "expires": "2025-03-09T15:00:00"}'
Common Commands: Set, get, and delete by key.
Redis example:
    SET user:1001 "John Doe"
    GET user:1001
    DEL user:1001
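
The set/get/delete model can be sketched as a small in-memory store. This is an illustration only, not how Redis is implemented: the class name `TinyKV` and the optional `ttl_seconds` parameter are assumptions, added to mirror the session-expiry use case.

```python
import time


class TinyKV:
    """A toy key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired keys on read
            return None
        return value

    def delete(self, key):
        """Return True if the key existed and was removed."""
        return self._data.pop(key, None) is not None


store = TinyKV()
store.set("user:1001", "John Doe")
store.set("session:abc123", '{"user_id": 1001}', ttl_seconds=3600)
print(store.get("user:1001"))
```

Every operation is a single dictionary lookup by key, which is why this model suits caching and session storage but offers no querying by value.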

Document Databases

Definition: Stores data as documents (usually JSON or BSON). Offers flexible schema and advanced querying. Examples: MongoDB, CouchDB, Firestore.
MongoDB document example:
    { "_id": 1001, "name": "John Doe", "email": "john@example.com", "orders": [ { "order_id": 500, "total": 89.99 } ] }
Typical Use Cases: Content management systems, user profiles, event logging, any scenario requiring flexible data structures.
Example:
    Collection: "users"
    Documents: each represents an individual user profile, and each can have different fields if needed.
Common Commands: Insert, Find, Update, Delete (typical CRUD operations). Also supports indexing for fields.
MongoDB example:
    db.users.insertOne({ _id: 1001, name: "John Doe" })
    db.users.find({ _id: 1001 })
Query Flexibility: Can query nested fields and arrays, and perform aggregations. Supports advanced operators like $in, $lt, $regex, etc.
Aggregation example:
    db.orders.aggregate([ { $match: { status: "shipped" } }, { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } } ])
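
To show what the aggregation example computes, here is a plain-Python sketch of the same filter-then-group logic over a list of dictionaries. The sample orders and the function name `total_spent_by_customer` are made up for illustration, and no MongoDB driver is involved.

```python
from collections import defaultdict

# Hypothetical documents in an "orders" collection.
orders = [
    {"customerId": "c1", "status": "shipped", "amount": 20.0},
    {"customerId": "c2", "status": "pending", "amount": 15.0},
    {"customerId": "c1", "status": "shipped", "amount": 5.5},
    {"customerId": "c2", "status": "shipped", "amount": 10.0},
]


def total_spent_by_customer(docs, status="shipped"):
    """Filter by status, then sum amounts per customer."""
    totals = defaultdict(float)
    for doc in docs:
        if doc.get("status") == status:                 # the $match stage
            totals[doc["customerId"]] += doc["amount"]  # the $group stage with $sum
    return [{"_id": cid, "totalSpent": total} for cid, total in totals.items()]


print(total_spent_by_customer(orders))
```

The result lists customers in first-seen order here; a real `$group` stage does not guarantee output order unless a `$sort` stage follows it.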

Column-Family Databases

Definition: Organize data into column families (tables) whose rows can have varying columns. Optimized for reading and writing large volumes of data. Examples: Cassandra, HBase.
Cassandra table example:
    Keyspace: my_app
    Table: users
    Primary key: (user_id)

    CREATE TABLE users ( user_id int, name text, email text, PRIMARY KEY (user_id) );
Typical Use Cases: High write throughput, large-scale analytics, time-series data, event logging with predictable query patterns.
Example: For sensor readings, a wide row per sensor can hold one column per timestamp. In CQL this is modeled as one partition per sensor with the timestamp as a clustering column.
Common Commands: CQL (Cassandra Query Language) resembles SQL. You define tables, insert and update data, and declare partition keys and clustering keys.
Cassandra example:
    INSERT INTO users (user_id, name, email) VALUES (1001, 'John Doe', 'john@example.com');
Partitioning: Data is distributed across the cluster by partition key for horizontal scalability. Careful design of partition keys is crucial for performance.
Partition key example (user_id is the partition key, some_other_key is a clustering key):
    PRIMARY KEY ((user_id), some_other_key)
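
The partition key and clustering key roles can be sketched with an in-memory model: the partition key selects a bucket, and rows inside that bucket stay sorted by clustering key so range scans are cheap. The class `WideTable` and the sensor data are hypothetical, and this only imitates the data layout, not Cassandra's storage engine.

```python
import bisect
from collections import defaultdict


class WideTable:
    """Toy model of PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self):
        # partition_key -> list of (clustering_key, row), kept sorted
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, row)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, row))

    def range_query(self, partition_key, start, end):
        """Return rows in one partition with start <= clustering_key <= end."""
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return [row for _, row in rows[lo:hi]]


table = WideTable()
table.insert("sensor-1", "2025-03-09T15:00:00", {"temp": 21.5})
table.insert("sensor-1", "2025-03-09T14:00:00", {"temp": 20.9})
table.insert("sensor-2", "2025-03-09T14:30:00", {"temp": 18.2})
print(table.range_query("sensor-1", "2025-03-09T14:00:00", "2025-03-09T14:59:59"))
```

A query that names the partition key touches a single bucket, which is why time-series tables are usually designed so that each read targets one partition.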

Graph Databases

Definition: Stores data as nodes and relationships (edges). Excellent for highly interconnected data. Examples: Neo4j, JanusGraph.
Neo4j schema example:
    Nodes: (Person { name: "John", age: 30 })
    Relationship: (John)-[:KNOWS]->(Jane)
Typical Use Cases: Social networks, recommendation engines, fraud detection, network topologies, or anything requiring graph traversal.
Example: A "friend of a friend" search or a shortest path between entities.
Common Commands / Query Language: Cypher (Neo4j), Gremlin (Apache TinkerPop). Cypher queries use pattern matching on node labels and relationships.
Neo4j Cypher query example:
    MATCH (p:Person)-[:KNOWS]->(friend:Person) WHERE p.name = "John" RETURN friend;
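
The two traversals mentioned above, "friend of a friend" and shortest path, can be sketched over a plain adjacency dictionary. The graph below is invented for illustration (only John and Jane appear in the original example), and a graph database would run such traversals over indexed relationships instead of Python dictionaries.

```python
from collections import deque

# Hypothetical directed KNOWS relationships.
knows = {
    "John": ["Jane", "Raj"],
    "Jane": ["Ana"],
    "Raj": ["Ana", "Lee"],
    "Ana": ["Lee"],
    "Lee": [],
}


def friends_of_friends(graph, person):
    """People exactly two KNOWS hops away who are not already direct friends."""
    direct = set(graph.get(person, []))
    result = set()
    for friend in direct:
        for fof in graph.get(friend, []):
            if fof != person and fof not in direct:
                result.add(fof)
    return result


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(sorted(friends_of_friends(knows, "John")))
print(shortest_path(knows, "John", "Lee"))
```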

General Best Practices

Know Your Access Patterns: Design your schema (or data model) around how data is queried. This is critical in NoSQL to optimize performance.
Use Indexes Wisely: Indexes speed up reads but can slow writes and use additional memory/disk. Only index what you really need.
Partition / Shard Carefully: Even data distribution across clusters is important for performance. Avoid hotspots by choosing keys that won’t concentrate loads on a single node.
Monitor and Tune: Monitor query performance, resource usage, and replication lag. Tweak configurations (e.g., read/write consistency levels) for your workload.
Security and Backups: Enable authentication/authorization, encrypt data at rest and in transit, and have a reliable backup/recovery strategy.
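
The hotspot warning can be illustrated by counting how many writes land on each node under two candidate partition keys. Everything here is an assumption for the sake of the sketch: the helper names, the four-node cluster, and the synthetic sensor events.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, num_nodes: int) -> int:
    """Route a partition key to a node with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def load_per_node(partition_keys, num_nodes):
    """Count how many writes each node receives."""
    counts = Counter(node_for(k, num_nodes) for k in partition_keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# 1,000 hypothetical writes from 100 sensors, all on the same day.
events = [(f"sensor-{i % 100}", "2025-03-09") for i in range(1000)]

# Partitioning by day alone sends every write to a single node.
by_day = load_per_node([day for _, day in events], num_nodes=4)

# Adding the sensor ID to the key spreads the same writes across nodes.
by_sensor_and_day = load_per_node([f"{s}:{day}" for s, day in events], num_nodes=4)

print("by day:           ", by_day)
print("by sensor and day:", by_sensor_and_day)
```

The trade-off is that a finer-grained key spreads writes but means a "all readings for one day" query must now visit many partitions, so the choice should follow the access patterns.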